Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[helm-chart] add values for setting annotations and labels for rollout-operator #6733

Open
wants to merge 10 commits into
base: main
Choose a base branch
from

Conversation

m1rp
Copy link

@m1rp m1rp commented Nov 24, 2023

What this PR does

add labels and annotations used by the rollout-operator.

Which issue(s) this PR fixes or relates to

Fixes #6650
Also very related to This PR over in the rollout-operator repo

Checklist

  • Tests updated.
  • Documentation added.
  • CHANGELOG.md updated - the order of entries should be [CHANGE], [FEATURE], [ENHANCEMENT], [BUGFIX].
  • about-versioning.md updated with experimental features.

Extra

i'd love some feedback on this, because not all possible annotation/labels have been added nor are they configurable so some input would be appreciated if this is something worth investing more time into.

@CLAassistant
Copy link

CLAassistant commented Nov 24, 2023

CLA assistant check
All committers have signed the CLA.

@m1rp
Copy link
Author

m1rp commented Nov 24, 2023

@56quarters tagging you here as you have some context for these changes, as you're also tagged here and i think it makes sense for these things to be aligned.

@m1rp m1rp force-pushed the rollout-operator-config branch 2 times, most recently from 7f9e087 to 840edc0 Compare November 24, 2023 14:58
Copy link
Contributor

@56quarters 56quarters left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for this!

@@ -53,11 +53,20 @@ spec:
metadata:
labels:
{{- include "mimir.podLabels" $args | nindent 8 }}
grafana.com/no-downscale: {{$rolloutZone.noDownscale}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think this is needed for the alertmanager, I'm pretty sure they're safe to scale down at any time. Could one of the alerting folks confirm? @stevesg et al

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might also be the case for the store-gateway right?
the resource that needs extra safety when scaling down is mainly the ingester if i'm not mistaken

@@ -52,8 +52,17 @@ spec:
metadata:
labels:
{{- include "mimir.podLabels" $args | nindent 8 }}
grafana.com/no-downscale: {{$rolloutZone.noDownscale}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These two annotations are mutually exclusive:

  • A statefulset can either be "no-downscale" in which case, it's not allowed to be scaled down by anything.
  • Or it can be scaled down by the rollout-operator

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make sense that the no-downscale is the deciding factor here?
by that i mean that the condition for rendering the prepare-scaledown is that it's value is true and that no-downscaleis false?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dimitarvdimitrov has suggested that no downscale be the default behavior which slightly simplifies this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry, I was interfering. I think having the prepare-scaledown and graceful scaledown procedure by the rollout operator are easier to reason about from an operator's POV - you just scale the StatfulSets as you would normally and the rollout-operator sorts out preparations and wait times.

But I suspect Nick's point was that setting both doesn't make sense. So it would be useful if we add some validations that they both cannot be set. Perhaps you can add a named template (helm function) to this file which checks the values of mimir.podLabels and mimir.podAnnotations and fails the chart rendering when both are set.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

oh i see, i didn't know about this possibility!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

sorry I didn't make this clear - I think it's better if we have the prepare-downscale and the downscale leader on by default. This would remove some toil from operators as opposed to only preventing accidental downscaling.

grafana.com/prepare-downscale-http-port: 80
{{- end-}}
{{- if $rolloutZone.downscaleLeader }}
grafana.com/rollout-downscale-leader: downscaleLeader
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The value of this should be the name of the leader statefulset. For example, the leader of the zone B statefulset should be mimir-ingester-zone-a.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

my mistake, i meant to add this as a variable.

annotations:
{{- include "mimir.podAnnotations" $args | nindent 8 }}
{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port: 80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't remember if the default HTTP port for Mimir is changed in the helm chart. This might be 8080 these days.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}. It's set in the same file. I think we should use the same template here

Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the contribution! Can you also add a changelog entry in operations/helm/charts/mimir-distributed/CHANGELOG.md

@@ -950,6 +974,14 @@ ingester:
# If set to "-", then use `storageClassName: ""`, which disables dynamic provisioning
# If undefined or set to null (the default), then fall back to the value of `ingester.persistentVolume.storageClass`.
storageClass: null
# -- noDownscale adds a label that can be used by the rollout-operator as documented in the rollout-operator repo: https://github.com/grafana/rollout-operator#webhooks
noDownscale: false
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should we enable this by default in all ingester zones? The idea behind this annotation was to prevent accidental downscaling

annotations:
{{- include "mimir.podAnnotations" $args | nindent 8 }}
{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port: 80
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's dynamic based on {{ include "mimir.serverHttpListenPort" . }}. It's set in the same file. I think we should use the same template here

@m1rp m1rp force-pushed the rollout-operator-config branch from 840edc0 to 446ac64 Compare December 4, 2023 14:33
m1rp added 3 commits January 2, 2024 14:53
…t-operator

Signed-off-by: sam clulow <sam.clulow@springernature.com>
- default `no-downscale` to true
- downscale labels are now conditionally rendered
- fix path and port
Signed-off-by: sam clulow <sam.clulow@springernature.com>
@m1rp m1rp force-pushed the rollout-operator-config branch from a844db4 to 6ff0205 Compare January 2, 2024 13:57
@m1rp m1rp marked this pull request as ready for review January 4, 2024 13:42
@m1rp m1rp requested a review from a team as a code owner January 4, 2024 13:42
Comment on lines 542 to 544
"noDownscale": $rolloutZone.noDownscale
"downscaleLeader": $rolloutZone.downscaleLeader
"prepareDownscale": $rolloutZone.prepareDownscale
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think the linter is complaining about the colons here

@dimitarvdimitrov
Copy link
Contributor

hey, there's something off with the commits in this PR. I see quite a lot of changes and number of commits. Can you take another look please @m1rp?
Screenshot 2024-03-25 at 14 12 11

Signed-off-by: sam clulow <sam.clulow@springernature.com>
@m1rp m1rp force-pushed the rollout-operator-config branch from 4b4a6c4 to 2df4381 Compare March 25, 2024 13:14
@m1rp
Copy link
Author

m1rp commented Mar 25, 2024

apologies, i managed to spectacularly mess up my git history/commit! should be sorted now.

@dimitarvdimitrov
Copy link
Contributor

really sorry for the delay 🙈 I was away the week before and swiped away all GH notifications

I can try to take a look later this week after GrafanaCon. @56quarters can you give this one more look?

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
Copy link
Contributor

@dimitarvdimitrov dimitarvdimitrov left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dropped a few long-overdue comments. Apologies once again for the delay

# -- downscaleLeader is an Annotation used by the rollout-operator
# If defined, downscaleLeader: <downscaleLeader>
# If undefined or set to null (the default), no Annotation is set
downscaleLeader: null
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

would be great if we can set the downscale leader automatically in the zonal templates. That way automatica downscaling can be just a single boolean toggle for the ingester/store-gateway which automatically sets up downscale leader and prepare downscale for the 3 zones.

But i think this PR has been lingering for long enough to try to slow it down further. happy to not include this change in the PR if you want.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

having said that - i think having noDownscale: false would be a safer change because it will mean folks can keep running the chart as they used to. If we get automatic downscaling automatically set up like I suggested above, then I think it's viable to have it on by default.

I think it makes sense to try to minimize the added manual effort for folks after they upgrade to a chart version which contains this PR

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think that changing noDownscale: false makes sense. and as you mentioned the PR has been lingering for a while and i just don't have the bandwidth to deal with it.

{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: store-gateway/prepare-shutdown
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }}
{{- end-}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

i think this tag is invalid - it should be {{- end -}} (extra space) It's also invalid for the other two templates

{{- if (eq $rolloutZone.noDownscale true )}}
grafana.com/no-downscale: {{$rolloutZone.noDownscale}}
{{- else }}
grafana.com/prepare-downscale: {{$rolloutZone.prepareDownscale}}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do we need the annotation even if prepareDownscale is false? I think we can just not apply it

Comment on lines 67 to 73
{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: ingester/prepare-shutdown
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }}
{{- end-}}
{{- if $rolloutZone.downscaleLeader }}
grafana.com/rollout-downscale-leader: $rolloutZone.downscaleLeader
{{- end }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

we should also set the grafana.com/min-time-between-zones-downscale=12h label on ingesters. store-gateways can work with about 30m (in case lazy loading is disabled). alertmanagers shouldn't need this config

Comment on lines 57 to 73
metadata:
labels:
{{- include "mimir.podLabels" $args | nindent 8 }}
{{- if (eq $rolloutZone.noDownscale true )}}
grafana.com/no-downscale: {{$rolloutZone.noDownscale}}
{{- else }}
grafana.com/prepare-downscale: {{$rolloutZone.prepareDownscale}}
{{- end }}
annotations:
{{- include "mimir.podAnnotations" $args | nindent 8 }}
{{- if $rolloutZone.prepareDownscale }}
grafana.com/prepare-downscale-http-path: store-gateway/prepare-shutdown
grafana.com/prepare-downscale-http-port: {{ include "mimir.serverHttpListenPort" . }}
{{- end-}}
{{- if $rolloutZone.downscaleLeader }}
grafana.com/rollout-downscale-leader: $rolloutZone.downscaleLeader
{{- end }}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

these annotations and labels should be set on the stateful set not on the pods themselves.

that's also what jsonnet does

statefulSet.mixin.metadata.withAnnotationsMixin({
'grafana.com/prepare-downscale-http-path': 'ingester/prepare-shutdown',
'grafana.com/prepare-downscale-http-port': '80',
}),

these are the rollouts operator docs https://github.com/grafana/rollout-operator?tab=readme-ov-file#how-scaling-up-and-down-works

Copy link
Author

@m1rp m1rp May 28, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is this true of the labels as well? nevermind i looked at the jsonnet and answered my own question i think

@dimitarvdimitrov
Copy link
Contributor

i also merged with upstream and did some trivial changes in the changelog and some comments in the values.yaml.

@zzehring zzehring removed the request for review from grafanabot October 1, 2024 15:10
@zerok zerok removed the request for review from a team October 2, 2024 09:45
@armandgrillet
Copy link
Contributor

@m1rp Thank you for the contribution, can you please do the changes mentioned in the comments? Otherwise, we will close this PR.

@chencs
Copy link
Contributor

chencs commented Dec 20, 2024

The CHANGELOG has just been cut to prepare for the next release. Please rebase main and eventually move the CHANGELOG entry added / updated in this PR to the top of the operations/helm/charts/mimir-distributed/CHANGELOG.md document. Thanks!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

[helm] Add labels and annotations for rollout operator functionality
6 participants